The performance of top taggers, for example in resonance searches, can be significantly enhanced 
through an increased set of variables, with a special focus on final-state radiation. We study the 
production and the decay of a heavy gauge boson in the upcoming LHC run. For constant signal 
efficiency, the multivariate analysis achieves an increased background rejection by up to a factor of 30 
compared to our previous tagger. Based on this study and the documentation in the Appendix we 
release a new HEPTopTagger2 for the upcoming LHC run. It now includes an optimal choice of the 
size of the fat jet, N-subjettiness, and different modes of Qjets. 

I. INTRODUCTION 

After the discovery of the Higgs boson, a keystone of the Standard Model, one main task for the upcoming 
LHC runs will be searches for physics beyond the Standard Model. Several open experimental and theoretical 
questions point to additional particles or structures at energies above the electroweak energy scale [1]. A 
very generic feature of many extensions of the Standard Model is the presence of additional heavy particles 
which preferentially decay to a pair of top quarks [2]. One example of such a resonance could be a heavy 
neutral $Z'$ gauge boson with a TeV-scale mass. Historically, such states were only searched for using 
semileptonically decaying top pairs. There, the kinematic reconstruction relies on an approximate reconstruction 
of the missing neutrino momentum through a $W$-mass or top-mass condition. In the last LHC run this search 
channel was supplemented by resonance searches based on boosted, hadronically decaying top pairs. In the 
corresponding ATLAS analysis [3] the HEPTopTagger [4] and the template tagger [5] each showed a 
similar reach, comparable with the semileptonic channel. This experimental success is based on rapid progress 
in the field of jet substructure both experimentally and theoretically, which will gain even more momentum 
during the 13 TeV LHC run. 

The field of top and Higgs tagging [7] started essentially as a Gedankenexperiment to illustrate 
recombination jet algorithms [5]. After some early attempts, for example to tag hadronically decaying tops [3], it 
took off with the development of the BDRS Higgs tagger with its mass drop condition [10] and a filtering 
step targeting underlying event and pile-up. The first top taggers were simple, deterministic algorithms 
which could identify and reconstruct hadronically decaying top quarks, including subjet $b$-tagging. 
They were based on deliberately simple structures and algorithms, to firmly establish subjet methods in 
ATLAS and CMS. After the experimental success of these completely new analysis tools in the first run of the 
LHC, the upcoming run will benefit from more advanced top tagging methods. Those include multivariate 
taggers, template taggers [5], as well as shower deconstruction or event deconstruction*. For 
those specialized tools the challenge will be to still provide a universal top tagging approach, which on the 
one hand allows for optimal experimental results, but on the other hand identifies and reconstructs boosted 
top quarks independent of the specialized analysis framework. 

Over time, the original HEPTopTagger [3] has gone through several rounds of improvements. The 
first modification included a re-formulation of the algorithm, leading to the trademark A-shaped kinematic 
cuts [5]. One of the key observations leading to these cuts is that in the absence of a $b$-tag it is not helpful to 
uniquely identify the two $W$-decay jets because in typical top decays there will be two jet-jet combinations 
which reconstruct to an invariant mass around 80 GeV [19]. The first set of new, additional variables [20] 
then included a combination of the usual filtered top mass with a pruned top mass [21]. In this upgrade 
we introduce a fat jet radius up to $R = 1.8$ for moderately boosted tops and allow for a choice of 
Cambridge-Aachen [32] and $k_T$ jet algorithms in all internal clustering and filtering steps except for the mass drop 
condition. This improves the tagging performance for highly boosted tops [20]. Recently, the algorithm was 
slightly changed to avoid background shaping. In the same study we added a low-$p_T$ mode based on 
Fox-Wolfram moments to incorporate angular correlations, extending the tagging coverage to $p_{T,t} = 150$ GeV. 

In this paper we present a detailed study of the HEPTopTagger2, collecting all previous modifications, 
as well as a whole range of new features targeted at multivariate analyses and statistical approaches to single 
events. The main body of the paper will focus on $Z'$ searches, where final-state jet radiation turns 
out to be the limiting factor of the original tagger. After resolving the issue with final-state radiation we 
will step by step improve the tagging algorithm by defining and including additional kinematic information. 
Finally, we will compare the multivariate tagging performance with the leading projections based on event 
deconstruction. 

The main background in fully hadronic $Z' \to t\bar t$ searches is QCD multi-jet production, which allows us to 
directly translate all our findings into a performance study based on tagging $t\bar t$ pairs in the Standard Model. 
We will show these results together with a review of the complete HEPTopTagger2 algorithm and the code 
interface in the Appendix. 


* Why a kinematic selection as naive as 'top buckets' also seems to work is beyond the comprehension of the authors. 

II. RESONANCE RECONSTRUCTION 

The key challenge of any top tagger is its broad range of applications and the related optimization of the 
algorithms and codes. For example, the HEPTopTagger was developed to solve the combinatorial problems 
in $t\bar tH$ searches [1]. The first public tagging code was presented for supersymmetric top partner searches 
in semileptonic top decays. Its proposed applications include single top production to experimentally 
separate the $s$-channel and $t$-channel production processes [28]. However, its experimental application during 
the first LHC run was the search for heavy resonances decaying to hadronic $t\bar t$ pairs [3]. For such a 
resonance search the kinematic top tagger in combination with a $b$-tag showed a similar performance as the 
usual, approximate reconstruction of semileptonic $t\bar t$ pairs. In this paper we will present a set of improvements 
towards the HEPTopTagger2 for a $Z'$ search at the 13 TeV LHC. Many of these improvements can be 
applied to other LHC processes, as will be discussed in the Appendix. 

In using all available information from a pair of boosted top quarks, event deconstruction is currently giving 
the leading performance estimates for heavy resonance searches. For the analysis in the main body of 
this paper we will follow the analysis framework of the event deconstruction study to eventually allow for a 
comparison in Sec. IV. For the signal we therefore use Pythia8 [53] to generate $Z' \to t\bar t$ events with 
$m_{Z'} = 1500$ GeV at 13 TeV collider energy. Assuming the same couplings as for the Standard Model $Z$-boson 
would yield a width of $\Gamma(Z') = 47$ GeV; to be consistent with the experimental resolution assumed in that 
study we increase the width to 65 GeV and only simulate the vector couplings. However, we will see that this 
choice of the physical $Z'$ width does not affect our results, which are based on the reconstructed fat jet 
kinematics. For the $Z'$ decay we assume a 100% branching ratio to top pairs. The two backgrounds are continuum 
$t\bar t$ production, which we simulate assuming $p_{T,t} > 400$ GeV, and QCD di-jet production, also requiring 
$p_T > 400$ GeV. Again, we rely on Pythia8, keeping in mind that for the pure QCD background our di-jet rate might 
not be a conservative estimate. All top quarks are forced to decay hadronically. Our simulations for the main 
body of the paper include underlying event but do not account for pile-up or detector effects, unless explicitly 
mentioned. For a completely realistic study of the signal and background efficiencies of the new HEPTopTagger2 
we will have to rely on upcoming experimental studies. For our multivariate tagging analyses we optimize the 
background rejection with respect to the pure QCD background, because it is by far dominant. 


Decay kinematics 

On the analysis level we first select events with at least two fat jets with 

$$ p_{T,\text{fat}} > 400~\text{GeV} \quad\text{and}\quad |y_\text{fat}| < 2.5 \; , \tag{1} $$

reconstructed using the C/A algorithm [55] with cone size $R = 1.5$, as implemented in FastJet [53]. We limit 
ourselves to the two hardest fat jets in each event for the $Z'$ search. The corresponding cut flow is given in 
Tab. I. Using the old default HEPTopTagger setup we find a double top tagging efficiency of 
$\epsilon_\text{2 tags} = 14\%$ in the signal, as shown in Tab. I. If we apply a fixed invariant mass window 
$m_{tt} \in [1200,1600]$ GeV on the tagged and reconstructed top quarks, the $Z'$ tagging efficiency is 
$\epsilon_{Z'} = 10.2\%$. For the $t\bar t$ background we find mis-tagging probabilities of 
$\epsilon_\text{2 tags} = 13.7\%$ and $\epsilon_{Z'} = 3.3\%$. For the QCD background sample the double 
mistag rates are $\epsilon_\text{2 tags} = 6.6 \cdot 10^{-4}$ and $\epsilon_{Z'} = 1.5 \cdot 10^{-4}$. The QCD jets 
background exceeds the continuum top pair production by a factor of five after all cuts. 



                                      Z' -> tt    tt                 QCD
generator level                       10^5        10^5  (1.76 pb)    8 x 10^6   (1.93 nb)
>= 2 fat jets, Eq. (1)                69142       85284 (1.50 pb)    6.7 x 10^6 (1.62 nb)
hardest 2 fat jets HTT tagged         9679        11706 (0.21 pb)    4426       (1.07 pb)
m_tt in [1200,1600] GeV               7031        2817  (0.05 pb)    978        (0.24 pb)


Table I: Number of events and the corresponding Pythia8 cross section used for our analysis. The efficiencies 
$\epsilon_{S,B}$ for a $Z'$ extraction are defined as the ratio of the last to the second line in this table. 
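
The efficiencies quoted in the text follow directly from the event counts of Table I. As a minimal cross-check, assuming the counts as read from the table (the QCD denominator is rounded there, so the QCD numbers are approximate) and using dictionary keys that are purely illustrative labels:

```python
# Event counts read from Table I (Pythia8 samples); the QCD count after the
# fat jet selection is only given as a rounded number in the table.
CUT_FLOW = {
    "Zprime": {"two_fat_jets": 69142, "two_tags": 9679, "mass_window": 7031},
    "ttbar": {"two_fat_jets": 85284, "two_tags": 11706, "mass_window": 2817},
    "QCD": {"two_fat_jets": 6.7e6, "two_tags": 4426, "mass_window": 978},
}


def efficiency(sample, stage):
    """Fraction of events with two fat jets that survive the given stage."""
    counts = CUT_FLOW[sample]
    return counts[stage] / counts["two_fat_jets"]


for sample in CUT_FLOW:
    eps_tags = efficiency(sample, "two_tags")
    eps_window = efficiency(sample, "mass_window")
    print(f"{sample:7s} eps_2tags = {eps_tags:.2e}  eps_Zprime = {eps_window:.2e}")
```

The ratios reproduce the double-tag and mass-window efficiencies quoted above for all three samples.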
Figure 1: Left: ROC curves for the dominant QCD background vs. the $Z'$ signal after including additional kinematic 
information shown in Eq. (2). As in all figures the asterisk corresponds to the original HEPTopTagger described in 
Ref. [5]. Right: $|\Delta y|$ distribution of the reconstructed top quarks for signal and backgrounds. 


A straightforward improvement of the basic analysis shown in Tab. I should be to replace the mass window 
by a boosted decision tree (BDT) analysis, as implemented in TMVA, based on the reconstructed invariant 
mass $m_{tt}$. In the left panel of Fig. 1 we first show the results as a receiver operating characteristic (ROC) curve, 
correlating the best signal and background efficiencies based on a given set of kinematic observables. This 
approach has been used to improve and benchmark the general performance of the HEPTopTagger. 
Because the QCD jet background is dominant we always set up our multivariate analyses based on the $Z'$ signal 
and the QCD background sample. Compared to the working point of the original public HEPTopTagger 
tool with a fixed mass window $m_{tt} \in [1200,1600]$ GeV the new HEPTopTagger2 including $m_{tt}$ in a 
multivariate analysis looks slightly worse. The reason is the change in the order of steps in the algorithm described in 
the Appendix. It significantly reduces the background sculpting, but at the expense of background rejection, 
for example for a constant signal efficiency. On the other hand, the reduced background sculpting removes 
a major source of systematic uncertainty when we need to interpret an $m_{tt}$ distribution which shows a peak 
which could be due to a signal or to a sculpted background. Moreover, it turns out that the difference between 
the old and new taggers vanishes once both of them are used in a fully flexible multivariate framework. 
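
To make the ROC language concrete, the following sketch scans a threshold on a generic classifier score and returns the resulting pairs of signal and background efficiencies. It is not the TMVA implementation used for the figures; the function names and the toy scores are hypothetical and only illustrate how a background rejection $1/\epsilon_B$ at fixed $\epsilon_S$ is read off.

```python
def roc_points(signal_scores, background_scores):
    """Return (eps_S, eps_B) for every threshold, from tightest to loosest cut."""
    thresholds = sorted(set(signal_scores) | set(background_scores), reverse=True)
    n_sig, n_bkg = len(signal_scores), len(background_scores)
    points = []
    for cut in thresholds:
        eps_s = sum(score >= cut for score in signal_scores) / n_sig
        eps_b = sum(score >= cut for score in background_scores) / n_bkg
        points.append((eps_s, eps_b))
    return points


def rejection_at(points, target_eps_s):
    """Background rejection 1/eps_B at the tightest cut reaching target_eps_s."""
    for eps_s, eps_b in points:
        if eps_s >= target_eps_s:
            return float("inf") if eps_b == 0 else 1.0 / eps_b
    raise ValueError("target signal efficiency is never reached")


# Toy classifier output (made-up numbers, higher means more signal-like).
toy_signal = [0.9, 0.8, 0.7, 0.4]
toy_background = [0.6, 0.3, 0.2, 0.1]
toy_roc = roc_points(toy_signal, toy_background)
print(rejection_at(toy_roc, 1.0))
```

A fixed mass window corresponds to a single point in this plane, which is why the original tagger appears as one marker in the ROC figures.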

For a better discrimination between signal and background we should include additional variables in our 
multivariate analysis. The deterministic structure of the HEPTopTagger will still allow for a particularly 
clear separation of the actual tagging and reconstruction from a subsequent kinematic analysis based on the 
reconstructed top momenta. The first additional variable we include is the rapidity difference between the 
two reconstructed top quarks, $|\Delta y|$. The corresponding signal and background distributions are shown in the 
right panel of Fig. 1. While this variable might not be too efficient in removing the $t\bar t$ continuum background, 
events are visibly less central for QCD jets. The differences can hardly be translated into efficient kinematic 
cuts, but they will help as part of a multivariate analysis. In the left panel of Fig. 1 we show the corresponding 
improvement in terms of ROC curves. In particular for low signal efficiencies $\epsilon_S < 0.1$ we find a significant 
reduction of the background fake rates, going beyond the working point of the first HEPTopTagger. 

An obvious extension of our set of kinematic observables are the transverse momenta of the reconstructed 
top quarks. Note that as part of the ROC analysis we do not have to ensure that the different kinematic 
variables are independent of each other, which would be problematic for a combination of $m_{tt}$ and the $p_{T,t}$ 
distributions. Again, the improvement from the transverse momentum spectra is shown in the left panel of 
Fig. 1. All this illustrates that the kinematic information on the tagged and reconstructed tops can increase 
the background rejection by 50% to 100% for fixed signal tagging efficiency. We also see that once we include 
the top-pair invariant mass and the transverse momenta, the additional improvement from $|\Delta y|$ vanishes, 
because the 2-particle final state is essentially fully described. As kinematic observables in our multivariate 
analysis we choose 

$$ \{\, m_{tt},\; p_{T,t_1},\; p_{T,t_2} \,\} \qquad \text{(decay kinematics)}. \tag{2} $$

Figure 2: ROC curves for different combinations of initial-state jet radiation (ISR) and final-state jet radiation (FSR) 
in the $Z'$ signal generation. The background is QCD with ISR and FSR for all curves. 


QCD jets 

In purely hadronic searches for new physics, QCD effects beyond fixed order are a major issue in trying 
to theoretically understand the signal and backgrounds. Before we devise strategies to deal with final-state 
radiation and initial-state radiation in heavy-resonance searches we can estimate their effect on the naive 
tagger-based analysis. 

On the Monte Carlo level it is possible to separately remove initial-state radiation and final-state radiation 
from all signal events. For the QCD jets background this is not sensible, because we need both mechanisms 
to generate a sufficient jet multiplicity. The ROC curves in Fig. 2 show the expected improvements in the 
absence of additional signal jets. We see that the leading effect spoiling the signal extraction is final-state 
radiation (FSR). Initial-state radiation (ISR) affects top tagging in two ways. First, the additional QCD jets 
can mimic, for example, the softer $W$-decay jet and degrade the tagging efficiency through combinatorics. On 
the other hand, ISR jets recoil against the $Z'$, affecting the $p_T$ spectrum of the top quarks. In particular the 
tagging of the softer top decay can benefit from this recoil, which means that for large signal efficiency the 
results without ISR become worse than those with all jet activity included. 

As a whole, the results shown in Fig. 2 indicate potentially significant improvements of top taggers when we 
target the different effects of QCD jet radiation. We will show in the following subsection how a deterministic 
top tagger is limited by final-state radiation and how the new HEPTopTagger2 can avoid these issues. 
Combinatorial problems related to initial-state radiation will then be one of the key topics in Sec. III. 


Final-state radiation 

Final-state radiation (FSR) turns one of the key advantages of our top tagger into a significant problem: 
unlike some other top tagging approaches, the HEPTopTagger returns the 4-momentum of the tagged 
top, including a cut on the reconstructed top mass $m_\text{rec} \in [150,200]$ GeV [15]. This allows us to trivially 
reconstruct $m_{Z'}$. Final-state radiation off the top decay products will be captured by the jet clustering and 
contribute to the correct filtered top mass value $m_\text{rec}$. This way it will not pose a problem as long as the $Z'$ 
decays to on-shell tops. 
Figure 3: Effect of final-state radiation on the invariant mass of the tagged and reconstructed $t\bar t$ system $m_{tt}$ for the 
$Z'$ signal (left) and different approaches to reconstruct the $Z'$ mass peak (right). Monte Carlo truth is $\sqrt{p_{Z'}^2}$ with an 
assumed width of 65 GeV. 


However, if the $Z'$ decays to slightly off-shell tops, which then radiate and turn into on-shell tops, this final-state 
radiation off the intermediate top mis-aligns the actual $Z'$ with the $Z'$ as reconstructed from the top quarks 
at the moment they decay. Because the hard radiated gluon does not enter the top reconstruction, the top 
tag will pass, but lead to an underestimated $m_{Z'}$ value. In the left panel of Fig. 3 we indeed see that the 
$m_{tt}$ distribution for the top-tagged signal correctly peaks around $m_{Z'}$, but develops a sizeable asymmetric 
tail towards smaller $m_{tt}$ values. While the details of this asymmetric tail from Pythia8 should be subject to 
a detailed Monte Carlo study, we simply confirm that turning off final-state radiation by hand gets rid of it 
almost entirely. The remaining slight broadening as well as a minimal tail towards smaller $m_{tt}$ values is due 
to small losses in the top 4-momentum reconstruction of the tagger. At higher values of $m_{Z'}$ the asymmetric 
tail is further enhanced. 
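
The direction of this shift follows from 4-momentum conservation alone: leaving a radiated gluon out of the sum always lowers the invariant mass, since $(P+g)^2 = P^2 + 2\,P\cdot g$ with $P\cdot g \ge 0$ for a massless $g$. The sketch below illustrates this with made-up momenta (they are not taken from our simulation, and all helper names are hypothetical).

```python
import math


def four_vector(mass, px, py, pz):
    """Build (E, px, py, pz) for a particle of given mass and 3-momentum."""
    energy = math.sqrt(mass**2 + px**2 + py**2 + pz**2)
    return (energy, px, py, pz)


def add(*vectors):
    """Component-wise sum of 4-vectors."""
    return tuple(sum(components) for components in zip(*vectors))


def invariant_mass(p):
    energy, px, py, pz = p
    return math.sqrt(max(energy**2 - px**2 - py**2 - pz**2, 0.0))


M_TOP = 173.0  # GeV, illustrative value

# Two on-shell tops at the moment they decay, plus one gluon radiated earlier.
top_1 = four_vector(M_TOP, 0.0, 0.0, 720.0)
top_2 = four_vector(M_TOP, 0.0, 0.0, -600.0)
gluon = four_vector(0.0, 60.0, 0.0, -100.0)

m_tops_only = invariant_mass(add(top_1, top_2))          # what m_tt measures
m_with_gluon = invariant_mass(add(top_1, top_2, gluon))  # closer to the resonance mass
print(f"m(tt) = {m_tops_only:.0f} GeV, m(ttg) = {m_with_gluon:.0f} GeV")
```

Any observable that also collects the radiated gluon, such as the fat jet 4-momentum, therefore avoids the one-sided tail.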

The problem with large asymmetric tails from final-state radiation is that they cannot simply be corrected 
for in a universal top tagger. The basic structure of the HEPTopTagger has to identify and reconstruct 
top quarks, rather than the decay products of a heavy Z' resonance. Therefore, we do not modify the actual 
tagger, but we account for final-state radiation through an additional set of kinematic observables. 

Following the brief discussion above, including the kinematics of the fat jet in addition to the reconstructed 
top 4-momentum should remove the broad asymmetric tail in the reconstructed $m_{Z'}$ values. Again, we first 
select events with two tagged tops, including the top mass condition. Instead of using the 4-momenta of 
the tagged tops, we now reconstruct the $Z'$ mass from the 4-momenta of the two fat jets of size $R = 1.5$, 
which eventually lead to the top tags. In the presence of underlying event and initial-state radiation the 
naive $m_{ff}$ distribution peaks roughly at the correct $Z'$ mass and shows symmetric tails. To use the invariant 
mass of the two fat jets we need to apply filtering. In the right panel of Fig. 3 we compare the filtered 
invariant mass of the two fat jets $m_{ff}$ and its pruned value, both as implemented in FastJet [53]. As 
a reference we also show the $m_{tt}$ distribution from the left panel of the same figure. Unlike the reconstructed 
$m_{tt}$ distribution, both the filtered and the pruned $m_{ff}$ distributions give symmetric peaks around the correct 
$m_{Z'}$ value. 


                                 m_peak [GeV]   Gamma [GeV]   eps_Z'(+-150)   1/eps_tt   1/eps_QCD
m_tt in [1200,1600] GeV          -              -             0.136           22         2805
unfiltered                       1539           167           0.141           21         1960
R = 0.3, N = 4                   1457           152           0.146           28         2218
R = 0.3, N = 5                   1477           144           0.150           25         2098
R = 0.3, N = 6                   1489           139           0.151           25         2052
R = 0.3, N = 7                   1496           144           0.151           24         2043
R = 0.2, N = 5                   1443           140           0.141           29         2329
R = 0.3, N = 5                   1477           144           0.150           25         2098
R = 0.4, N = 5                   1500           144           0.151           24         2030
R = 0.5, N = 5                   1515           143           0.148           23         1993
pruning, z = 0.1, radius 0.5     1443           150           0.138           26         2075


Table II: Breit-Wigner fits and performance of different grooming approaches. The quoted efficiencies are based on a 
window for the invariant mass of the two filtered fat jets $|m_{ff} - m_{Z'}| < 150$ GeV. 

Figure 4: Reconstructed mass distribution of the $Z'$ signal and the backgrounds based on the tagged tops (left) and 
the corresponding filtered fat jets (right). 


To be able to use the filtered $m_{ff}$ values in our HEPTopTagger analysis we confirm that filtering and 
pruning give stable numerical results for the invariant mass of the two fat jets. Results for different parameter 
settings are listed in Tab. II. We give the peak positions, which would be subject to a proper calibration, the 
fitted Breit-Wigner widths for the symmetric peaks, and the tagging performances for a fixed mass window 
$|m_{ff} - m_{Z'}| < 150$ GeV. Replacing the Breit-Wigner shape with a Gaussian would make no difference for the 
width, but give a poorer modelling of the tails. Typical widths of the reconstructed $Z'$ mass peak will range around 
145 GeV, roughly twice the assumed particle width of 65 GeV. Even in the absence of detector effects, this 
resolution will replace the assumed particle width of 65 GeV in all of the following analysis. The stable 
numbers in Tab. II confirm that the $m_{ff}$ criterion is robust for different filtering parameters as well as pruning. 
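
The remark about the tails can be illustrated with a short calculation. Assuming, purely for illustration, a non-relativistic Breit-Wigner (Cauchy) line shape with full width 145 GeV and a Gaussian with the same full width at half maximum, the fraction of the peak inside the $\pm 150$ GeV window differs noticeably. These idealized fractions describe the line shapes only and are not the tagging efficiencies of Tab. II.

```python
import math


def breit_wigner_fraction(half_window, width):
    """Fraction of a Cauchy peak with full width `width` inside +-half_window."""
    return 2.0 / math.pi * math.atan(half_window / (width / 2.0))


def gaussian_fraction(half_window, fwhm):
    """Fraction of a Gaussian with the same full width at half maximum."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return math.erf(half_window / (sigma * math.sqrt(2.0)))


bw = breit_wigner_fraction(150.0, 145.0)
gauss = gaussian_fraction(150.0, 145.0)
print(f"Breit-Wigner: {bw:.2f}, Gaussian: {gauss:.2f}")
```

The Breit-Wigner shape leaves a much larger share of events outside the window, which is the tail behaviour a Gaussian fit would miss.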

On the other hand, the results shown in Tab. II also indicate that simply replacing the $m_{tt}$ window by a 
filtered $m_{ff}$ value will not improve the $Z'$ extraction. In Fig. 4 we show that the steeply falling QCD jets 
background now has a maximum around $m_{ff} = 1.3$ TeV, while for the reconstructed top quarks there exists 
a much more pronounced maximum around $m_{tt} = 900$ GeV. The reason is that top tagging removes events 
with many hard QCD jets in two steps: first requiring the correct top mass value from three assumed top 
decay products, and second when applying the $Z'$ mass window. If we remove the first step, the second one 
has to deal with larger backgrounds at high $m_{ff}$ values. 

If we want to include final-state radiation and at the same time benefit from its additional information, we 
need to keep $m_{ff}$ as well as $m_{tt}$ in our analysis, and not apply a simple mass window on the $m_{tt}$ distribution. 
The kinematics of the $Z'$-decay is then described by 

$$ \{\, m_{tt},\; m_{ff},\; p_{T,t_1},\; p_{T,t_2},\; p_{T,f_1},\; p_{T,f_2} \,\} \qquad \text{(filtered fat jets)}. \tag{3} $$

All default settings of the HEPTopTagger are listed in the Appendix. We filter the fat jets using $R = 0.3$ 
and keep the $N = 5$ hardest substructures. In the left panel of Fig. 5 we show the corresponding ROC curves. 
Unlike in the rest of the paper we study the $t\bar t$ and QCD jets backgrounds separately. The improvement of 
the full multivariate tagger including the fat jet information of Eq. (3) is obvious for both backgrounds. In 
the right panel of Fig. 5 we first show the same improvement, but using a BDT trained on the QCD jets 
background only. Compared to the original HEPTopTagger we achieve an improvement of up to a factor 
of 2 in $1/\epsilon_B$ for constant signal efficiency. We note that for the QCD background the combination of 
mis-tagged top kinematics and fat jet kinematics goes beyond the description of the hard process. For example, 
initial-state radiation, sensitive to the color structure of the signal and the background, will be captured in 
this combination of observables. On the other hand, because the fat jets are defined using the standard jet 
algorithms and show a stable filtering performance, we do not envision major experimental problems provided 
pile-up subtraction works as well as expected. 

Figure 5: Left: performance of the multivariate analysis including the information on the fat jet, as given in Eq. (2), 
Eq. (3) and Eq. (4). Only in this plot do we optimize for $t\bar t$ and QCD backgrounds separately. Right: performance 
curve for the full analysis only accounting for the dominant QCD jet background. 

The set of kinematic observables listed in Eq. (3) still relies on the deterministic HEPTopTagger output. 
This means that the identification of a $Z'$ signal event is limited by the efficiency of two top tags. The choice 
of a working point in the top tagging algorithm will therefore limit our overall efficiency. On the other hand, 
we already know that for hadronic $Z'$ searches the QCD jets background is dominant and will only be reduced 
through a combination of top tags and $Z'$ mass reconstruction. 

In addition, we omit a fixed mass window for the reconstructed top mass $m_\text{rec}$. Instead, we widely open 
the top mass and $W$-mass constraints in the tagging algorithm. For each of the tops the corresponding $m_\text{rec}$ 
value then becomes an output of the tagger. We provide the multivariate $Z'$ analysis with the smaller and 
larger of these two output $m_\text{rec}$ values, which we label as $m_\text{rec}^\text{min}$ and $m_\text{rec}^\text{max}$ 
respectively. Similarly, we avoid a fixed window for the ratio of the $W$-mass to the top mass, parametrized as 
$f_W$ in the tagging algorithm. Its deviation from the true value is given by the value of $f_\text{rec}$ defined in 
the Appendix. In the multivariate analysis we include the maximum of the two $f_\text{rec}$ values corresponding to 
each tagged top, 

$$ \{\, m_{tt},\; m_{ff},\; p_{T,t_1},\; p_{T,t_2},\; p_{T,f_1},\; p_{T,f_2},\; m_\text{rec}^\text{min},\; m_\text{rec}^\text{max},\; f_\text{rec}^\text{max} \,\} \qquad \text{(variable masses)}. \tag{4} $$

The result is shown in the right panel of Fig. 5, where the range of accessible efficiencies eventually extends to 
56%. Altogether, the analysis based on the set of kinematic variables shown in Eq. (4) gives us an improvement 
of up to a factor of 5 in background rejection for a constant $Z'$-signal efficiency. 


III. UPDATED TAGGER 

Fat jets with a geometric size of $R = 1.5$ or even $R = 1.8$ have been shown to be powerful new analysis objects 
at the LHC. The radius of the fat jet is directly related to the energy or boost of the heavy particles which 
can be captured. This means that a multi-purpose top tagger will be based on fat jets that are as large as possible. 
However, to realize their potential such large jets require additional treatment linked to their large geometric 
size. Without a dedicated analysis step, underlying event and pile-up will almost entirely wash out any 
structure inside the fat jet. Filtering, as an integral part of all versions of the HEPTopTagger, 
effectively reduces the geometric size of the fat jet used to reconstruct the top 4-momentum by introducing 
a second clustering stage with higher resolution. This solves the problem with underlying event and pile-up, 
but there remains a combinatorial problem caused, for example, by initial-state radiation. In particular the 
softer of the two subjets from the $W$-decay can easily be faked by a typical QCD jet inside the fat jet. 
This will lead to a wrong reconstruction of the top 4-momentum, which we can only counter by applying 
harder tagging requirements and hence reducing the tagging efficiency. These so-called type-2 tags [20], where 
only two of three top decay jets can be identified with a parton-level decay quark, have been in the focus of 
HEPTopTagger studies at moderate boost. In the reconstruction of heavy resonances we can 
solve the problem of (too) large fat jets by adapting the size of the fat jet to the kinematics of the tagged 
top. It turns out that this adaptive size of the fat jet also gives us another powerful kinematic variable for 
the multivariate analysis. Finally, we will show how this optimalR modification of our tagging algorithm can 
be further improved by including $N$-subjettiness variables. 


OptimalR mode 


There have been different attempts to adjust the size of the fat jet, for example based on the transverse 
momentum of the fat jet, but none of them led to a dramatic effect in the performance of taggers. 
We instead choose a purely algorithmic way of determining the minimum size of the fat jet. Assuming 
that three top decay jets are captured by the fat jet we can run the standard HEPTopTagger algorithm 
to determine the top mass from the three leading subjets [15]. For a large fat jet size, typically $R = 1.5$ 
or $R = 1.8$, we compute a reference value of $m_\text{rec}$, which should be around the top mass. In the usual 
tagging algorithm, this computation of $m_\text{rec}$ from filtered subjets takes into account final-state radiation off 
the on-shell top. We then reduce the size of the fat jet in steps of $\Delta R = 0.1$ and compute the corresponding 
values of $m_\text{rec}(R)$. In case of several possible triplets, this includes the step of choosing the one closest to the 
physical top mass, as described in step (5) in the Appendix. As a function of the decreasing jet size $R$ the 
fat jet mass $m_\text{rec}(R)$ will form a stable plateau, until the reduced fat jet is too small to capture all three 
top decay jets. At this point $m_\text{rec}(R)$ will leave the plateau and show a significant drop. For $R = 1.5$, which 
is sufficient for the $Z'$ mass in our study, we define this drop through 


$$ \frac{m_\text{rec}^{(1.5)} - m_\text{rec}(R)}{m_\text{rec}^{(1.5)}} > 0.2 \qquad \Rightarrow \qquad R < R_\text{opt} \; . \tag{5} $$


Once the shrinking fat jet passes this condition we go back one step to the last $R$ value on the plateau and 
define this value as $R_\text{opt}$. The smallest value we allow in this study is $R_\text{opt} = 0.5$, but for $p_{T,t}$ 
above 1 TeV this value can be adjusted in the tagger setup. This value could be a challenge for the calorimeter 
resolution, so the corresponding results are subject to tests based on a full detector simulation in ATLAS and in CMS. In 
this paper we typically arrive around $R_\text{opt} = 0.6$. The tagging result for this $R_\text{opt}$ value will be the output of 
the top tagger. 
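
The shrinking procedure can be sketched in a few lines. This is only a schematic version under stated assumptions: `m_rec` stands for a hypothetical callable that runs the tagger on the fat jet reclustered with radius $R$ and returns the reconstructed top mass, and the toy mass function below is invented to mimic a plateau followed by a drop.

```python
def find_r_opt(m_rec, r_max=1.5, r_min=0.5, step=0.1, max_drop=0.2):
    """Shrink the fat jet in fixed steps and return the last radius on the plateau.

    m_rec: callable mapping a fat jet radius R to the reconstructed top mass.
    """
    m_ref = m_rec(r_max)  # reference mass from the full-size fat jet
    n_steps = round((r_max - r_min) / step)
    r_opt = r_max
    for i in range(1, n_steps + 1):
        radius = round(r_max - i * step, 10)  # avoid floating-point drift
        if (m_ref - m_rec(radius)) / m_ref > max_drop:
            break  # mass dropped by more than 20%: the previous radius was R_opt
        r_opt = radius
    return r_opt


def toy_mass(radius):
    """Invented example: all three decay jets are captured only for R >= 0.8."""
    return 172.0 if radius > 0.75 else 110.0


print(find_r_opt(toy_mass))
```

If the mass never leaves the plateau, the loop ends at the smallest allowed radius, which mirrors the lower limit $R_\text{opt} = 0.5$ used in this study.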


Measuring $R_\text{opt}$ defines another useful variable for the top tagger, because we can also predict $R_\text{opt}$ from 
the fat jet kinematics. A similar reasoning is used in the original HEPTopTagger algorithm, where a 
consistency condition on the reconstructed top momentum $p_{T,t} > 200$ GeV ensures that the reconstructed 
top can actually be captured in the fat jet. In the optimalR mode we first determine the transverse momentum 
of the filtered fat jet, $p_{T,f}$, as described in the previous section. Including up to ten hardest subjets after a 
filtering step with $R_\text{filt} = 0.2$ turns out to give the best estimate for this purpose. Reducing this number 
to five subjets has no measurable effect on the width of the reconstructed $p_{T,f}$ distribution, but slightly shifts 
its maximum to smaller values. The final number will be subject to an independent optimization in 
ATLAS and CMS. 

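For $p_{T,f} > 200$ GeV we derive a closed form for $R_\text{opt}^\text{(calc)}$ by fitting a function $\propto 1/p_{T,f}$ 
to simulated data, as described in the Appendix. The kinematic variables in our multivariate tagger now read 

$$ \{\, m_{tt},\; m_{ff},\; p_{T,t_1},\; p_{T,t_2},\; p_{T,f_1},\; p_{T,f_2},\; m_\text{rec}^\text{min},\; m_\text{rec}^\text{max},\; f_\text{rec}^\text{max},\; R_\text{opt} - R_\text{opt}^\text{(calc)} \,\} \qquad \text{(optimalR)}. \tag{6} $$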
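
A one-parameter fit of the form $R_\text{opt}^\text{(calc)} = a/p_{T,f}$ has a closed-form least-squares solution, $a = \sum_i (R_i/p_i) / \sum_i (1/p_i^2)$. The sketch below assumes this pure proportionality and unweighted points; the coefficient and the toy data are invented for the example and are not the values of the Appendix.

```python
def fit_inverse_pt(pt_values, r_opt_values):
    """Least-squares estimate of a in R = a / pT (unweighted points)."""
    numerator = sum(r / pt for pt, r in zip(pt_values, r_opt_values))
    denominator = sum(1.0 / pt**2 for pt in pt_values)
    return numerator / denominator


def r_opt_calc(pt, coefficient):
    """Predicted fat jet size for a filtered fat jet of transverse momentum pt."""
    return coefficient / pt


# Toy points generated from an invented coefficient of 300 GeV.
toy_pt = [250.0, 400.0, 600.0, 800.0]
toy_r_opt = [300.0 / pt for pt in toy_pt]

a_fit = fit_inverse_pt(toy_pt, toy_r_opt)
deviation = 0.6 - r_opt_calc(600.0, a_fit)  # entry R_opt - R_opt(calc) of Eq. (6)
print(a_fit, deviation)
```

The last line shows how the difference between the measured and the predicted radius would enter the multivariate analysis as a single number per tagged top.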
Figure 6: Performance of the optimalR mode based on the kinematic variables in Eq. (6), including $N$-subjettiness 
variables as defined in Eq. (8), and including Qjets. As described in the text, for Qjets we need to require a finite 
calorimeter resolution, while all other curves do not include any detector effects. We only consider the dominant QCD 
background. 


For this case of two top tags we choose $R_\text{opt} - R_\text{opt}^\text{(calc)}$ as the maximum deviation of the tagged tops. In 
this form all subsequent kinematic variables linked to the top tags will be evaluated with the fat jet size 
$R_\text{opt}$. For the $Z'$ search $R_\text{opt}^\text{(calc)}$ will be strongly correlated with other kinematic variables listed in Eq. (6). 
We nevertheless include it in the BDT because the general multivariate HEPTopTagger2 described in the 
Appendix will not include the top momenta in the tagging. The increase of the tagging performance from the 
optimalR mode is shown in the left panel of Fig. 6. While for small signal efficiencies the curves for optimalR 
and for the variable mass setup of Eq. (4) are identical within numerical fluctuations, we observe a significant 
improvement for larger signal efficiencies. 


N-subjettiness 

The arguably simplest question we can ask as part of a top tagger is the number of hard subjets inside
the fat jet with a given jet mass. This number of subjets can be defined through an observable similar to
event shapes like for example thrust, called N-subjettiness [32, 33]. It is based on $N$ reference axes which
are required to match the $k$ hard substructures,

$$\tau_N = \frac{1}{R_0 \sum_k p_{T,k}} \sum_k p_{T,k}\, \min\left(\Delta R_{1,k}, \Delta R_{2,k}, \dots, \Delta R_{N,k}\right), \qquad (7)$$

where $\Delta R_{i,k}$ is the geometric separation between the axis $i$ and the substructure $k$. In this form N-subjettiness
parametrizes the deviation of the energy flow away from $N$ jets, not only related to an integer number of
subjets, but also reflecting the color structure and the related radiation pattern.

In terms of the original definition [32] we fix the exponent to $\beta = 1$. $R_0$ is an intrinsic cone size, chosen
such that $\tau_N < 1$. Small values $\tau_N \to 0$ indicate that the complete substructure is described by $N$ axes,
i.e. that there are at most $N$ relevant substructures. The ratio $\tau_N/\tau_{N-1}$ will therefore become small
for a fat jet with $N$ hard subjets. For top tagging the ratio $\tau_3/\tau_2$ will be most useful and can even be used as
a tagger itself. Higher $\tau_N$ values will contribute to a multivariate analysis of N-subjettiness, describing the
jet radiation pattern around the assumed three partonic top decay momenta.
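
As an illustration of Eq.(7), the following minimal Python sketch evaluates $\tau_N$ with $\beta = 1$ for a toy fat jet. The function names, the toy constituents, the hand-picked reference axes, and the value $R_0 = 1.5$ are assumptions made for this example only; an actual analysis would determine the axes with a dedicated algorithm.

```python
import math

def delta_r(y1, phi1, y2, phi2):
    """Geometric separation in the rapidity-azimuth plane."""
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))
    return math.hypot(y1 - y2, dphi)

def tau_n(constituents, axes, r0):
    """tau_N as in Eq.(7) with beta = 1.

    constituents: list of (pt, y, phi); axes: list of (y, phi); r0: cone size R_0.
    """
    norm = r0 * sum(pt for pt, _, _ in constituents)
    weighted = sum(
        pt * min(delta_r(y, phi, ya, pa) for ya, pa in axes)
        for pt, y, phi in constituents
    )
    return weighted / norm

# Toy fat jet (assumed values): three hard prongs plus one soft constituent.
constituents = [(200.0, 0.0, 0.0), (150.0, 0.6, 0.3), (100.0, -0.4, 0.7), (5.0, 0.2, 0.4)]
axes3 = [(0.0, 0.0), (0.6, 0.3), (-0.4, 0.7)]  # hand-picked axes on the three prongs
axes2 = axes3[:2]                              # only two axes: the third prong is unmatched

tau3 = tau_n(constituents, axes3, r0=1.5)
tau2 = tau_n(constituents, axes2, r0=1.5)
ratio32 = tau3 / tau2
```

For this three-prong toy configuration $\tau_3$ only receives a contribution from the soft constituent, so the ratio $\tau_3/\tau_2$ comes out small, in line with the argument above.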

We will use N-subjettiness as an additional variable in our multivariate HEPTopTagger. Originally, this
combination did not lead to a significant improvement when added to the A-shaped cuts [?]. However, when
we open the cut $f_W$ on the reconstructed ratio $m_W/m_t$ we observe a significant improvement for the extended
set of kinematic variables. The complete set of relevant kinematic variables, now including N-subjettiness
variables before and after filtering, is


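$$\{\, m_{tt},\ m_{ff},\ p_{T,t_1},\ p_{T,t_2},\ p_{T,f_1},\ p_{T,f_2},\ m_\text{rec}^\text{min},\ m_\text{rec}^\text{max},\ f_\text{rec}^\text{max},\ R_\text{opt} - R_\text{opt}^\text{(calc)},\ \{\tau_N\},\ \{\tau_N^\text{(filt)}\} \,\} \qquad \text{(N-subjettiness)}. \qquad (8)$$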


As in Eq.(6) all kinematic variables linked to the top tag will be evaluated with the fat jet size $R_\text{opt}$. The
details of the implementation of the N-subjettiness variables are discussed in the Appendix.


Qjets 

The main limitation even of the deterministic multivariate HEPTopTagger is the aim to identify a
unique set of subjets from the top decay as part of the tagging procedure, which allows us to reconstruct the
4-momentum of the tagged top and for example compare it to Monte Carlo truth. If the kinematic selection
identifies a wrong set of subjets as the best candidates for the top decay products, an actual top decay can
easily fail the tagging procedure. To avoid this loss in signal efficiency we can allow for more than one set of
candidate subjets to be tested. One approach that not only covers several candidate subjet combinations,
but even allows for a statistical analysis of many such assignments, is Qjets [?].

During the clustering of the fat jet the standard recombination algorithms combine the closest set of pre-jets
according to a given measure. For the C/A algorithm this measure is the geometric separation $d_{ij} = \Delta R_{ij}$
of the pre-jets $i$ and $j$. Qjets generalizes this deterministic choice to a likelihood measure. For each pair of
pre-jets $(i,j)$ it computes the weight


$$\omega_{ij}^{(\alpha)} = \exp\left[ -\alpha\, \frac{d_{ij} - d_\text{min}}{d_\text{min}} \right] \qquad (9)$$


and then chooses the two pre-jets to cluster according to a random number following the weights $\omega_{ij}^{(\alpha)}$. For this
study we choose $\alpha = 0.1$, to balance the convergence of the algorithm with our aim of generating alternative
subjet assignments for the top tagger. The standard jet algorithm corresponds to the limit $\alpha \to \infty$. The
global weight for a clustering history is defined as




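$$\Omega^{(\alpha)} = \prod_\text{mergings} \omega_{ij}^{(\alpha)} = \prod_\text{mergings} \exp\left[ -\alpha\, \frac{d_{ij} - d_\text{min}}{d_\text{min}} \right]. \qquad (10)$$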


The universal limiting case $\Omega^{(\alpha)} \to 1$ for a perfect clustering history indicates that in searching for the largest
global weight the choice of $\alpha$ should not make a major difference. The Qjets clustering procedure can be
repeated many times, where in this study we typically rely on 100 clustering histories. They can be ranked by
their global weights $\Omega^{(\alpha)}$ instead of the independent local weights $\omega_{ij}^{(\alpha)}$ used by a deterministic jet algorithm. For
each history we apply the unclustering and top tagging algorithm. As long as the deterministic jet algorithm
picks a reasonable merging history for a signal event we expect the outcome of the deterministic tagger and
the tagger acting on the clustering history with the highest global weight to be close.
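
The weights of Eqs.(9) and (10) can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the function names and the three example distances are hypothetical, and the actual Qjets implementation operates on full jet clustering sequences rather than on a fixed dictionary of distances.

```python
import math
import random

def pair_weights(distances, alpha):
    """Local weights of Eq.(9) for all candidate pairs.

    distances: dict mapping a pair (i, j) to its measure d_ij.
    """
    d_min = min(distances.values())
    return {pair: math.exp(-alpha * (d - d_min) / d_min) for pair, d in distances.items()}

def choose_merging(distances, alpha, rng):
    """Pick one pair at random, with probability proportional to its local weight."""
    weights = pair_weights(distances, alpha)
    pairs = list(weights)
    chosen = rng.choices(pairs, weights=[weights[p] for p in pairs], k=1)[0]
    return chosen, weights[chosen]

def global_weight(local_weights):
    """Global weight of Eq.(10): product of the local weights of all mergings."""
    return math.prod(local_weights)

# Toy example (assumed distances between three pre-jets).
distances = {(0, 1): 0.2, (0, 2): 0.5, (1, 2): 0.9}
weights = pair_weights(distances, alpha=0.1)
rng = random.Random(1)
pair, w = choose_merging(distances, alpha=0.1, rng=rng)
```

The closest pair always receives weight one, so a history that follows the deterministic choice at every step has global weight one. For very large $\alpha$ all other weights vanish and the deterministic algorithm is recovered.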

The first advantage of Qjets appears when during an early clustering step the deterministic measure
$d_{ij}$ identifies the wrong merging in the sense that the remaining history cannot be described well by QCD.
This deterministic history will by definition receive the maximum global weight $\Omega^{(\alpha)} = 1$. However, an
alternative history in better agreement with QCD could reach a similarly large global weight. Because Qjets
provides many alternative clustering histories, we can search for a set of top tags with comparably large global
weights. For example, we can use the two positively tagged Qjets histories with the highest global weight
in the multivariate analysis. This way, a possibly misleading deterministic result is corrected. This should
improve the performance in particular when we enforce high signal efficiencies, where the tagger becomes most
vulnerable to a wrong clustering input. It turns out that already this simple modification gives a sizeable
improvement in the signal efficiency.
The second improvement to the usual top tagger is based on the HEPTopTagger output for the full set of 100
clustering histories. First, we include the fraction of positive top tags based on the default HEPTopTagger
settings among all 100 Qjets histories, $\varepsilon_\text{Qjets}$, as introduced in the Appendix. Next, we extract statistical
information from distributions over the Qjets histories, for example of the reconstructed top mass $m_\text{rec}$.
This distribution is defined for $\varepsilon_\text{Qjets} \times 100$ histories. Signal events will strongly peak around the top mass
with a possible secondary peak around the $W$ mass. QCD background events will instead show a smooth
decrease. The two most relevant observables in the $m_\text{rec}$ distribution are the mean and the variance of this
reconstructed top mass distribution with 100 entries, symbolically denoted as $\{m_\text{rec}^\text{Qjets}\}$.


We base our multivariate analysis on the second approach. We start with the top-tagged Qjets history
with the highest global weight and run the tagging algorithm on this history only. In addition, we include the
statistical information of the $m_\text{rec}$ distribution of the subset of the 100 Qjets histories which defines a top
candidate. The complete list of observables including the Qjets information now reads


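$$\{\, m_{tt},\ m_{ff},\ p_{T,t_1},\ p_{T,t_2},\ p_{T,f_1},\ p_{T,f_2},\ m_\text{rec}^\text{min},\ m_\text{rec}^\text{max},\ f_\text{rec}^\text{max},\ R_\text{opt} - R_\text{opt}^\text{(calc)},\ \{\tau_N\},\ \varepsilon_\text{Qjets}^\text{min},\ \{m_\text{rec}^\text{Qjets}\} \,\} \qquad \text{(Qjets)}, \qquad (11)$$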


where $\{\tau_N\}$ represents the appropriate set of filtered and unfiltered N-subjettiness variables (for example
$N = 1, 2, 3$ for each of the two tops). For the two tags in the $Z'$ analysis we choose the smaller $\varepsilon_\text{Qjets}$ value of
the two. All variables from the tagger are evaluated for the optimized $R$ size and the clustering history with
the largest global weight.


In Fig. 6 we show the effect of the Qjets histories in addition to the other improvements. A key difference
between the previous discussion and the Qjets approach is that we now need to include some kind of detector
resolution, to limit Qjets to a manageable number of significantly different merging histories. For that reason
we divide the calorimeter into $\eta \times \phi$ cells of size $0.1 \times 0.1$ and pre-cluster the entire set of calorimeter entries
before applying any jet algorithm. Because this detector resolution effect is not included for the previous
results, the Qjets ROC curve does not consistently exceed the N-subjettiness curve without Qjets. On
the other hand, we still observe the expected improvement towards large signal efficiencies. The moderate
drop at small signal efficiencies gives us confidence that a full detector simulation will not lead to significant
degradation of our results.
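
The pre-clustering into calorimeter cells can be sketched as follows. This is a minimal Python illustration, not the code used for the results above: the function names and toy particles are assumptions, four-momenta are simply summed per cell, and the azimuthal wrap-around at $\pm\pi$ receives no special treatment.

```python
import math
from collections import defaultdict

def precluster(particles, cell=0.1):
    """Sum four-momenta (px, py, pz, E) that fall into the same eta-phi cell.

    Assumes every particle has non-zero transverse momentum.
    """
    cells = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for px, py, pz, e in particles:
        pt = math.hypot(px, py)
        eta = math.asinh(pz / pt)   # pseudorapidity
        phi = math.atan2(py, px)    # azimuth in (-pi, pi]
        key = (math.floor(eta / cell), math.floor(phi / cell))
        for k, component in enumerate((px, py, pz, e)):
            cells[key][k] += component
    return [tuple(p) for p in cells.values()]

def massless(pt, eta, phi):
    """Helper for the toy input: massless four-momentum from (pt, eta, phi)."""
    return (pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta), pt * math.cosh(eta))

# Toy input (assumed values): the first two particles share one 0.1 x 0.1 cell.
particles = [massless(10.0, 0.51, 0.32), massless(5.0, 0.55, 0.38), massless(20.0, -1.23, 2.05)]
towers = precluster(particles)
```

The resulting list of cell momenta would then serve as input to the jet algorithm, which limits the number of distinct merging histories Qjets can generate.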


IV. FULL EVENT INFORMATION 

Going back to the discussion in Sec. II the remaining question is how the new HEPTopTagger2 performance
compares to other approaches designed for the upcoming LHC run. The benchmark for such a
comparison is event deconstruction, or more specifically the projections for a $Z'$ resonance search [17]. As
mentioned in our discussion of jet radiation in Sec. II the borders between the hard process or the $Z'$ decay
on the one side and QCD jet radiation and its sensitivity to the signal and background color structure on
the other side are washed out when we include for example filtered subjets or N-subjettiness information.
We therefore start with a brief discussion of the additional information from jets in the entire event and then
move on to the comparison with the leading benchmark in proposed $Z'$ analyses.


Additional jets 

To determine to what degree the jet structure of purely hadronic $Z' \to t\bar t$ events helps the extraction of
the signal from the $t\bar t$ and QCD jets background we first study the number and kinematic distribution of
small C/A jets with $R = 0.2$ and $p_{T,j} > 10$ GeV in addition to the fat jets fulfilling Eq.(?). We choose
these very small jets in order to test information which might be available from so-called microjets in shower 
deconstruction. Our discussion should not be applied to an LHC analysis one-to-one and is instead aimed at 
capturing as much information as possible. Without any major cuts, the number of jets will consist of three 
decay jets per top quark, FSR jets, and ISR jets. For an inclusive event sample, we should be able to tell 
apart the different processes from the number of jets and the kinematics of the individual jets [3S]. 

After a first level of cuts we see in Fig. 7 that the $Z'$ signal and the $t\bar t$ background both peak at 10
microjets, i.e. four jets from ISR and FSR combined. For the background the number is slightly larger,
because we generate the scale of the hard process also through a large number of jets. We also see that the
transverse momentum of the hardest jet is slightly larger for the signal. We could include these jet patterns in
a multivariate analysis, but at this stage this information would be very heavily correlated with the variables
from the top tagger.

Figure 7: Information on the hardest jet before top tagging (upper row) and the hardest jet left over after top tagging
(lower row). For the jets defined with $R = 0.2$ and $p_T > 10$ GeV we show the number of jets, the hardest jet's
transverse momentum, and its mass in $Z'$ candidate events (left to right).

In a second step we focus on the jet activity which does not contribute to the top tagging. Inside the fat jets
we know that the top tagger includes information based on subjets with typically $R = 0.3$ and $p_T > 20$ GeV
after filtering. After two tags we then remove all calorimeter data associated with the filtered triplet of either
of the top candidates and re-cluster the remnants into microjets with $R = 0.2$ and $p_{T,j} > 10$ GeV. In the
lower panels of Fig. 7 we see how after removing the signal decay jets the remaining number of jets peaks
around two ISR or FSR jets. For the QCD background this number is higher, because it takes a larger number
of equally distributed jets in the detector to fake a boosted massive top inside each fat jet. The transverse
momentum of the hardest of the remaining jets also peaks at very small values for the signal and the
$t\bar t$ background, as one would expect for example for a small number of ISR jets. The bulk of the hardest
QCD jets per event shows transverse momenta around $p_{T,j} = 50 - 200$ GeV, still small compared to the hard
scale imprinted on the multi-jet background through the kinematic selection of Eq.(?). We should be able to
use this additional information for our BDT analysis, to improve the signal extraction. In the right panel of
Fig. 6 we see the corresponding ROC curve. It turns out that almost all of the information available through
the extra jet radiation is already included in our combined analysis of top tags and subjet kinematics.

Based on this piece of information we assume that additional jet information inside and outside the fat jets
hardly changes the stable results of the updated top tagger, so we can compare the new HEPTopTagger2
to other multivariate methods.

Figure 8: Comparison of the multivariate HEPTopTagger2 analysis presented in this paper with the event
deconstruction approach of Ref. [17]. All HEPTopTagger2 curves correspond to Fig. 6, but now with a collider energy
of 14 TeV instead of 13 TeV. In the absence of an experimental validation this comparison should be taken as a first
estimate.


Comparison with other approaches 

The most promising projections for boosted top identification and specifically searches for $t\bar t$ resonances
during the upcoming LHC runs are available for shower deconstruction [16] or event deconstruction [17].
This method is based on a construction of likelihoods representing possible shower histories for a jet or a
fat jet. The underlying objects are so-called C/A [32, 33] microjets with $R = 0.2$ and $p_T > 10$ GeV [17].
They are slightly softer and smaller than the subjets in a typical top tagger, but we have seen that the
additional information from those jets should not make a big difference. Unlike general template methods,
shower deconstruction relies on the soft and/or collinear approximation of QCD to compute the likelihood
of a given shower history in terms of splitting probabilities and Sudakov factors (non-splitting probabilities).
Based on the possible shower histories the likelihood ratio of a fat jet coming from a boosted top quark or
from the QCD jet background acts as a measure for the top tag. One problem with shower deconstruction,
like any probabilistic approach, is that we cannot separate the identification and the reconstruction of the
boosted top quark. This means we cannot for example show the quality of the reconstructed 4-momentum
compared to Monte Carlo truth.

The $Z'$ analysis using event deconstruction starts with two fat jets of size $R = 1.5$ and the acceptance cuts
given in Eq.(?). The number of microjets is limited to 9 per fat jet. In addition to the likelihood separating
the top or QCD origin of each of the two fat jets, the event likelihood measure now also includes a likelihood
describing the resonant or non-resonant production of the pair of fat jets given their 4-momenta. At the level
of the hard process this part is not very different from the established matrix element method [37] and largely
replaces an analysis of the $m_{tt}$ and $p_{T,t}$ distributions defining the multivariate analysis of Eq.(?). In Ref. [17]
the observable width of the $m_{tt}$ resonance is assumed to range around 65 GeV, an assumption we follow. In
our analysis the precise resolution, for example after detector effects, only plays a secondary role, because the
resolution of the HEPTopTagger2 is limited to 145 GeV, as shown in Tab. II.

In Fig. 8 we compare the performance of the analysis developed in this paper with the recent benchmark of
event deconstruction. One difference to the HEPTopTagger results shown in Fig. 6 is that we now show
$Z'$ efficiencies up to 68%, confirming that Qjets indeed gives us a major improvement for very large signal
efficiencies. Another difference is that for a direct comparison we now assume a collider energy of 14 TeV.
Event deconstruction and the new HEPTopTagger2 show a comparable performance for the upcoming
run. The final answer on both methods will only be given by experimental studies including data.
V. CONCLUSION

We demonstrated how the updated HEPTopTagger2 performs in searches for Z' bosons or other heavy 
resonances decaying to top pairs in the upcoming LHC run. Based on the original HEPTopTagger we 
modify the tagging algorithm and add several additional kinematic variables to a multivariate analysis: 

- fat jet kinematics to account for final-state radiation in resonance searches; 

- algorithmically optimized size of the original fat jet combined with its prediction (optimalR mode); 

- N-subjettiness probing the more general subjet structures inside the fat jet;

- Qjets with a global picture of the most likely clustering histories giving a top tag. 

Each of these improvements can be added to the top tagging individually. For the specific Z' resonance 
search we altogether achieve an increase of the background rejection by a factor of 30 for a constant Z'-signal 
efficiency of 10%. Compared to the original tagger the background sculpting in the invariant mass of 
the top pair is significantly reduced [?]. These updated results are at least competitive with the leading
estimates for other tagging methods. 

Because the multivariate Z' analysis includes several layers of improvement, not necessarily linked to the 
actual top tagging, we also show in the Appendix the corresponding improvements for top tagging in $t\bar t$ events.
There, we test the updated tagger for moderate ($p_{T,t} > 200$ GeV) and sizeable ($p_{T,t} > 600$ GeV) boost and
find a significant improvement in particular for larger boost. The limiting factor for moderate boost still is
capturing all three top decay jets inside a fat jet, which has to be targeted by a dedicated low-$p_T$ mode [15].
The corresponding HEPTopTagger2 described in the Appendix will be made publicly available [?]. In
particular for Qjets there exist different modes which need to be tested on data.

Comparing the improvement of the Z' analysis with that in the individual top tags shows that the benefits 
for the full Z' case are significantly larger than those just from the top tags. A lesson from this is that it is 
useful to consider the optimization of top tagging, not only in its own right, but also in the context of full 
search analyses. 
Appendix: HEPTopTagger2

In the past it has proven useful to publish details about the HEPTopTagger algorithm. We describe
the new structure reflecting all changes in Refs. [?] in this Appendix. Because the main body of the
paper is focused on the performance in resonance searches we then present benchmark results based on purely
hadronic $t\bar t$ events in the Standard Model. They can be directly translated for example into semi-leptonically
decaying $t\bar t$ pairs. Finally, the enhanced capabilities of the HEPTopTagger2 have led to enough
complexity of the actual code that we briefly describe the run modes, the input parameters, and the available
output information from the tagger.


Algorithm 


The basic HEPTopTagger2 algorithm largely follows the original algorithm described in Ref. [5], but is
based on FastJet3 [53] and includes a number of new features:


1. define a C/A fat jet with $R_\text{fat} = 1.8$ and determine the splitting history through the default clustering.

2. identify all hard subjets using a mass drop criterion: undo the last clustering of the jet $j$ into two
subjets $j_1, j_2$ with $m_{j_1} > m_{j_2}$; require $m_{j_1} < f_\text{drop}\, m_j$ with $f_\text{drop} = 0.8$ to keep both; otherwise, keep
only $j_1$; further decompose or add each subjet $j_i$ to the list of relevant substructures. A global soft
cutoff $m_{j_i} > m_\text{min} = 30$ GeV can be adjusted$^1$.

3. iterate through all triplets of three hard subjets: filter them with resolution $R_\text{filt} = \min(0.3, \Delta R_{jk}/2)$;
use the $N_\text{filt} = 5$ hardest filtered constituents and calculate their combined jet mass; re-cluster these
five subjets into three assumed top decay jets; reject all triplets outside $m_{123} = m_\text{rec} \in [150, 200]$ GeV;
keep the event if at least one such triplet exists. For the multivariate analysis this window is opened to
$m_\text{rec} < 1$ TeV, which allows us to use $m_\text{rec}$ as a kinematic output of the tagger.

This set of re-clustering and filtering steps by default uses the C/A jet algorithm [55]. However, to
guarantee infrared safety and enhance the performance at large boosts [15] it can be switched to $k_T$
jets [23].

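4. order the three subjets $j_1, j_2, j_3$ by $p_T$; if the masses $(m_{12}, m_{13}, m_{23})$ satisfy one of the following three
criteria, accept them as a top candidate:

$$\begin{aligned}
& 0.2 < \arctan\frac{m_{13}}{m_{12}} < 1.3 \quad\text{and}\quad R_\text{min} < \frac{m_{23}}{m_{123}} < R_\text{max} \\
& R_\text{min}^2 \left(1 + \left(\frac{m_{13}}{m_{12}}\right)^2\right) < 1 - \left(\frac{m_{23}}{m_{123}}\right)^2 < R_\text{max}^2 \left(1 + \left(\frac{m_{13}}{m_{12}}\right)^2\right) \quad\text{and}\quad \frac{m_{23}}{m_{123}} > 0.35 \\
& R_\text{min}^2 \left(1 + \left(\frac{m_{12}}{m_{13}}\right)^2\right) < 1 - \left(\frac{m_{23}}{m_{123}}\right)^2 < R_\text{max}^2 \left(1 + \left(\frac{m_{12}}{m_{13}}\right)^2\right) \quad\text{and}\quad \frac{m_{23}}{m_{123}} > 0.35
\end{aligned} \qquad (12)$$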


where $R_\text{min,max} = (1 \mp f_W)\, m_W/m_t$ defines the parameter $f_W$, by default set to $f_W = 0.15$. The soft
cutoff $m_{23} > 0.35\, m_{123}$ as well as the limits [0.2, 1.3] in the first line can be adjusted. All kinematic
cuts are listed in Tab. IV and can be adapted in a multivariate approach. In the multivariate case we
open the $W$-mass window to $f_W = 0.3$. The ratio of the $W$-mass to the top mass can then be used as
a kinematic output defined as

$$f_\text{rec} = \min_{ij} \left| \frac{m_{ij}/m_{123}}{m_W/m_t} - 1 \right|. \qquad (13)$$


$^1$ We have checked that replacing the mass drop criterion with a soft drop criterion [?] does not improve the performance of
the tagger noticeably.
5. of all triplets passing the above criteria in a given fat jet choose the one with $m_{123} = m_\text{rec}$ closest to
$m_t$. This selection has proven to be the most efficient, and applying it after all kinematic cuts minimizes
the background sculpting. The $m_\text{rec}$ and $f_\text{rec}$ values supplied to the multivariate analysis are those
corresponding to this triplet.

6. for consistency, require the reconstructed $p_{T,t}$ to exceed 200 GeV.

7. in the low-$p_T$ mode [15] reduce this threshold to $p_{T,t} > 150$ GeV; compute the Fox-Wolfram moments [23]

$$H_\ell^x = \sum_{i,j=1}^N W_{ij}^x\, P_\ell(\cos \Omega_{ij}) \qquad \text{with} \quad W_{ij}^T = \frac{p_{T,i}\, p_{T,j}}{\left(\sum p_{T,i}\right)^2} \quad \text{and} \quad W_{ij}^U = \frac{1}{N^2} \qquad (14)$$

of the subjets relative to each other and relative to the reconstructed top momentum. This mode is not
part of the usual tagger and relies on external GSL libraries [39] for Legendre polynomials.


8. in the optimalR mode repeat steps 1 to 3 with a decreasing fat jet radius in steps of $\Delta R = 0.1$; based
on the condition $m_\text{rec}^{(R_\text{fat})} - m_\text{rec}^{(R)} > 0.2\, m_\text{rec}^{(R_\text{fat})}$ determine the minimum radius $R_\text{opt} > 0.5$; follow steps 4
to 6 with this modified fat jet. We also parametrize the expected value for $R_\text{opt}$ in terms of $p_{T,f}$ based
on the numerical simulation of the top decay kinematics illustrated in Fig. 9,

$$R_\text{opt}^\text{(calc)} = \frac{327}{p_{T,f}}. \qquad (15)$$


9. in the N-subjettiness mode [?] compute the $\tau_j$ [32] as defined in Eq.(7) from the filtered and unfiltered
subjets, as described below. Again, this mode is not part of our tagger code and relies on the FastJet
Contrib [21, 35] add-on for N-subjettiness [?].


10. in the Qjets mode replace the deterministic output of step 1 by a set of possible histories defined in
Eq.(10); run the tagger for each of them, giving a set of clustering histories with global weights $\Omega^{(\alpha)}$, and
a positive or negative tagging result.
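
The mass plane criteria of step 4 and the kinematic output of Eq.(13) can be written compactly. The Python sketch below is only an illustration: the function names and the numerical values of $m_W$ and $m_t$ are assumptions for this example and are not taken from the tagger code.

```python
import math

M_W, M_T = 80.4, 172.3  # GeV; illustrative values, assumed for this sketch

def passes_mass_plane(m12, m13, m23, m123, f_w=0.15, m23_cut=0.35,
                      atan_lo=0.2, atan_hi=1.3):
    """Return True if one of the three criteria of Eq.(12) is satisfied."""
    r_min = (1.0 - f_w) * M_W / M_T
    r_max = (1.0 + f_w) * M_W / M_T
    x = 1.0 - (m23 / m123) ** 2
    c1 = atan_lo < math.atan(m13 / m12) < atan_hi and r_min < m23 / m123 < r_max
    c2 = (r_min**2 * (1 + (m13 / m12) ** 2) < x < r_max**2 * (1 + (m13 / m12) ** 2)
          and m23 / m123 > m23_cut)
    c3 = (r_min**2 * (1 + (m12 / m13) ** 2) < x < r_max**2 * (1 + (m12 / m13) ** 2)
          and m23 / m123 > m23_cut)
    return c1 or c2 or c3

def f_rec(m12, m13, m23, m123):
    """Kinematic output of Eq.(13): smallest relative deviation from m_W/m_t."""
    target = M_W / M_T
    return min(abs(m / m123 / target - 1.0) for m in (m12, m13, m23))

# Toy triplets (assumed masses in GeV): a top-like one and a QCD-like one.
top_like = dict(m12=107.76, m13=107.76, m23=80.4, m123=172.3)
qcd_like = dict(m12=30.0, m13=40.0, m23=20.0, m123=172.0)
```

Opening the window to $f_W = 0.3$ for the multivariate case only changes the `f_w` argument, while `f_rec` stays available as a continuous input to the BDT.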



Figure 9: Fit based on Standard Model $t\bar t$ samples with $p_{T,t} > 200, 400, 600$ GeV for the parton level distance
of decay products $R_{bjj}$. The fat jets are filtered with $R = 0.2$, $N = 10$. The functional form of the fit curve is given
in Eq.(15).

Figure 10: Performance of the HEPTopTagger2 for $t\bar t$ production in the Standard Model. We show the incremental
improvements from the extended multivariate analyses for top quarks with $p_{T,t} > 200$ GeV and $p_{T,t} > 600$ GeV.


Following this description the low-$p_T$ (7) and N-subjettiness (9) modes simply add kinematic observables
to the tagger output. These observables can be included in a multivariate analysis or can be cut on in the
deterministic top tagging decision. The improvement in the low-$p_T$ mode is illustrated in detail in Ref. [15],
while the impact of N-subjettiness variables on the resonance search is illustrated in Fig. 6.

In contrast, the optimalR mode and the Qjets mode modify the clustering histories (1) underlying the
mass drop search (2). Depending on the modified fat jet size or on the Qjets weight they return a set of
tagging outputs. For the optimalR mode it is straightforward to choose the smallest reasonable fat jet size
$R_\text{opt}$ for the actual tagging. The Qjets histories can be evaluated in a range of possible ways.
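
The radius scan of the optimalR mode can be sketched as follows. This is a hedged Python illustration, not the tagger implementation: `find_r_opt`, `mass_at_radius`, and `toy_mass` are hypothetical names, and the sketch assumes that the reconstructed mass at each radius is compared to the one obtained with the initial fat jet size.

```python
def r_opt_calc(pt_filtered):
    """Expected R_opt from Eq.(15); pt_filtered is the filtered fat jet pT in GeV."""
    return 327.0 / pt_filtered

def find_r_opt(mass_at_radius, r_start=1.8, r_stop=0.5, step=0.1, threshold=0.2):
    """Smallest radius before the reconstructed mass drops by more than `threshold`.

    mass_at_radius: callable R -> reconstructed top mass in GeV, or None if the
    tagger finds no candidate at this radius (hypothetical interface).
    """
    m_ref = mass_at_radius(r_start)
    if m_ref is None:
        return None
    n_steps = round((r_start - r_stop) / step)
    r_opt = r_start
    for k in range(1, n_steps + 1):
        r = round(r_start - k * step, 10)
        m = mass_at_radius(r)
        if m is None or m_ref - m > threshold * m_ref:
            break  # a top decay jet has been lost; keep the last stable radius
        r_opt = r
    return r_opt

def toy_mass(r):
    """Toy model (assumed): all decay jets are captured for R >= 0.8."""
    return 172.0 if r >= 0.8 - 1e-9 else 110.0

r_opt = find_r_opt(toy_mass)
delta = r_opt - r_opt_calc(400.0)  # the BDT input R_opt - R_opt^(calc) for pT,f = 400 GeV
```

For a genuine top jet the difference `delta` should be small, while for a QCD fat jet the measured and calculated radii are generally not expected to agree.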


Performance 

The main body of this paper focuses on $t\bar t$ resonance searches using the HEPTopTagger described above.
While the combination of tagged top kinematics and fat jet kinematics in Sec. II does not directly translate
into a universal top tagger, the multivariate aspects discussed in Sec. III, namely optimalR, N-subjettiness,
and Qjets, do. Here, we show efficiencies for extracting $t\bar t$ events from the QCD multi-jet background.

Our analyses are based on fully hadronic $t\bar t$ signal and QCD dijet background samples generated with
Pythia8 [21]. For the general top tagger analysis in this Appendix we include underlying event in the event
generation and mimic the limited detector resolution by clustering the hadronic activity into $\eta \times \phi$ cells of
size $0.1 \times 0.1$, similar to the Qjets results shown in Fig. 6. Instead of the hard acceptance cuts in Eq.(?) we
now allow for softer fat jets. Two multivariate BDT analyses focus on $t\bar t$ samples with

$$p_{T,\text{fat}} > 200\ \text{GeV} \qquad |y_\text{fat}| < 2.5 \qquad p_{T,t} > 200, 600\ \text{GeV}, \qquad (16)$$

where the top momenta are evaluated on the Monte Carlo truth level. We select events with fat C/A jets of
radius $R_\text{fat} = 1.8$ and $|y_\text{fat}| < 2.5$ constructed with FastJet.

Background efficiencies $\varepsilon_B$ are defined relative to the number of those fat jets. For the signal efficiencies
we require that the fat jets can be matched to a parton level top quark within $\Delta R < 0.8$. Using the original
version of the HEPTopTagger [?] we find for the $p_T > 600$ GeV samples a signal efficiency of $\varepsilon_S = 35.6\%$
and a mis-tagging rate $\varepsilon_B = 2.7\%$. The first change in the algorithm addresses the signal efficiency and
background sculpting. In the original algorithm the triplet of subjets closest to the true top mass is selected
and only later the mass plane cuts are applied. Therefore, the tagger will fail if this triplet does not pass the
mass plane constraints and no alternative triplet is analyzed. To eliminate this limitation, we first apply the
mass plane constraints and then pick the triplet closest to the top mass, as described above.
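
The difference between the two orderings can be made explicit with a small Python sketch. The candidate dictionaries, the toy cut, and the value of $m_t$ are assumptions chosen for illustration; the real mass plane requirement is the one defined in Eq.(12).

```python
M_T = 172.3  # GeV; illustrative value, assumed for this sketch

def select_late(candidates, passes_cuts):
    """Original ordering: pick the triplet closest to m_t, then apply the cuts."""
    best = min(candidates, key=lambda c: abs(c["m123"] - M_T))
    return best if passes_cuts(best) else None

def select_early(candidates, passes_cuts):
    """Updated ordering: apply the cuts first, then pick the triplet closest to m_t."""
    passing = [c for c in candidates if passes_cuts(c)]
    if not passing:
        return None
    return min(passing, key=lambda c: abs(c["m123"] - M_T))

def toy_cut(candidate):
    """Stand-in for the mass plane requirement (hypothetical window on m23/m123)."""
    return 0.40 < candidate["ratio"] < 0.54

candidates = [
    {"m123": 171.0, "ratio": 0.20},  # closest in mass, but fails the toy cut
    {"m123": 165.0, "ratio": 0.47},  # slightly further away, passes the toy cut
]
late = select_late(candidates, toy_cut)
early = select_early(candidates, toy_cut)
```

In this toy case the late ordering loses the fat jet altogether, while the early ordering recovers the second triplet as a valid top candidate.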
As in the main text we study further improvements of the tagger based on ROC curves. To allow for
such improvements we loosen the cuts of the tagger to $m_\text{rec} < 1$ TeV and $f_W = 0.3$. The initial set of BDT
parameters in analogy to Eq.(?) is

$$\{\, m_\text{rec},\ f_\text{rec} \,\} \qquad \text{(variable masses)}. \qquad (17)$$


The large cone size of $R = 1.8$ is not always appropriate, so the optimalR mode optimizes the radius of each
fat jet. Starting from the initial cone size we stepwise reduce the size of the fat jet until the criterion Eq.(?)
indicates that we miss a top decay jet. For the last stable $R$ size we run the usual tagging algorithm. We can
calculate the expected value $R_\text{opt}^\text{(calc)}$ for the critical radius based on the transverse momentum of the filtered
fat jet. For a fat jet originating from a top decay this prediction should agree with the measured value, while
for a background fat jet the two are only strongly correlated when the entire subjet kinematics is a perfect
match to a top decay. For the optimalR mode we set up a BDT analysis with the observables


$$\{\, m_\text{rec},\ f_\text{rec},\ R_\text{opt} - R_\text{opt}^\text{(calc)} \,\} \qquad \text{(optimalR)}. \qquad (18)$$


All tagging observables are evaluated for a fat jet with size $R_\text{opt}$. In Fig. 10 we show the improvement from
the optimized size of the fat jet. Obviously, it is more impressive for large boost, while for $p_{T,t} > 200$ GeV
the optimalR mode hardly leads to a reduction in fat jet size.


The N-subjettiness variables are best applied independently for fat jets which would pass and would not
pass the initial tagging criterion. The optimalR working point

$$m_\text{rec} \in [150, 200]\ \text{GeV} \qquad f_\text{rec} < 0.175 \qquad R_\text{opt} - R_\text{opt}^\text{(calc)} < 0.3, \qquad (19)$$

which corresponds to the signal efficiency $\varepsilon_S = 0.22\,(0.27)$ in Fig. 10, defines these two categories. Fat
jets passing Eq.(19) can be assumed to include a complete set of top decay products and are filtered with
$R_\text{filt} = 0.2$ and $N_\text{filt} = 5$; fat jets failing this criterion are instead filtered with $R_\text{filt} = 0.3$ and $N_\text{filt} = 3$. The
unfiltered N-subjettiness variables $\tau_j$ defined in Eq.(7) and their filtered counterparts $\tau_j^\text{(filt)}$ are included
up to $j \leq 3$. The reference axes are chosen as $k_T$-axes. We then set up two independent BDTs with
(20)


and later combine them into one ROC curve. This precise condition is represented by the more generic Eq.(?).
In Fig. 10 we show the corresponding ROC curves for a successively improved tagger.


Finally, we can replace the deterministic clustering history from the usual jet algorithm with a set of Qjets
histories with large global weights $\Omega^{(\alpha)}$ defined in Eq.(10) for $\alpha = 0.1$. This way we avoid cases where the
deterministic clustering history entering the top tagging algorithm is misled during the independent evaluation
of splittings in the usual jet algorithm. When defining jets as analysis objects for a hard process this does
not pose a problem, but for subjet analyses it can have an effect.

Our analysis is based on 100 Qjets histories per fat jet. In Tab. III we show their signal and background
efficiency if required to lead to individual top tags. As the reference value we use the default HEPTopTagger
with fixed mass windows. Based on 100 Qjets histories we then define the fraction $\varepsilon_\text{Qjets}$ of histories which
lead to a top tag with the default tagging setup. We see that for moderately boosted tops the deterministic
signal tagging efficiency can be reproduced by requiring 30% of the Qjets histories to deliver a positive tag.
The corresponding mis-tag probability is slightly reduced compared to the deterministic tagger. For harder
tops the corresponding value is around $\varepsilon_\text{Qjets} > 20\%$, with no improvement in the background rejection.
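
The Qjets summary observables for a single fat jet can be computed as in the following Python sketch. It is an illustration under assumptions: the input format and the function name are hypothetical, the toy input has only five histories instead of 100, and the population variance is used because the text does not specify the variance convention.

```python
from statistics import mean, pvariance

def qjets_summary(histories):
    """Fraction of tagged histories and statistics of their reconstructed mass.

    histories: list of (tagged, m_rec) pairs, one per Qjets clustering history.
    Returns (eps_qjets, mean of m_rec, variance of m_rec) over the tagged histories.
    """
    tagged_masses = [m for tagged, m in histories if tagged]
    eps = len(tagged_masses) / len(histories)
    if not tagged_masses:
        return eps, None, None
    return eps, mean(tagged_masses), pvariance(tagged_masses)

# Toy input (assumed values): five histories, three of them with a positive tag.
histories = [(True, 170.0), (True, 176.0), (False, 0.0), (True, 173.0), (False, 0.0)]
eps, m_mean, m_var = qjets_summary(histories)
passes_30_percent = eps > 0.3  # example working point from Tab. III
```

A cut on `eps` reproduces a working point of the kind listed in Tab. III, while the mean and variance are candidates for additional multivariate inputs.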


As discussed in Sec. III, Qjets offers two strategies to improve the top tagger. To maximize the improvement
in the tagging performance and to limit the CPU time we base the multivariate analysis on the tagged history
with the largest global weight. As additional parameters we include the value of $\varepsilon_\text{Qjets}$ as well as the mean
and variance of the $m_\text{rec}$ distribution with the 100 Qjets entries, symbolically denoted as $\{m_\text{rec}^\text{Qjets}\}$. For the
BDT analysis the variables are

$$\{\, m_\text{rec},\ f_\text{rec},\ R_\text{opt} - R_\text{opt}^\text{(calc)},\ \{\tau_N\},\ \{\tau_N^\text{(filt)}\},\ \varepsilon_\text{Qjets},\ \{m_\text{rec}^\text{Qjets}\} \,\} \qquad \text{(Qjets)}. \qquad (21)$$
Figure 11: Performance of the HEPTopTagger2 for $t\bar t$ production in the Standard Model. For $p_{T,t} > 200$ GeV and
$p_{T,t} > 600$ GeV we focus on different Qjets setups, based on a more basic multivariate tagger without optimalR
and N-subjettiness.


As usual, all variables from the tagger are evaluated for the optimized $R$ size and the clustering history with
the largest global weight. The additional improvement is shown in Fig. 10.


Because Qjets offers a variety of improvements to the tagger, we study different setups based on the stage
with multivariate mass windows in Fig. 11. We start by replacing the deterministic C/A output with the most
likely Qjets history and including $\varepsilon_\text{Qjets}$ in the multivariate analysis. This leads to a moderate improvement
of the tagger at large transverse momenta and at large signal efficiencies. Adding the statistical information
from the $\varepsilon_\text{Qjets} \times 100$ entries in the $m_\text{rec}$ distribution leads to a sizeable improvement over a wide range of
signal efficiencies. This is the mode we use for the Z' analysis as well as in Fig. 10.

Next, we add the second-best Qjets history to the tagger, such that the multivariate tagger (including
$\varepsilon_\text{Qjets}$) is free to construct a criterion based on one or two tags in the two best Qjets histories. For most of
the ROC curves this comparably simple approach is as successful as the full statistical information. Finally,
adding the statistical information on the $m_\text{rec}$ distribution leads to a mild improvement.



| | $t\bar{t}$ ($p_T > 200$ GeV) | QCD ($p_T > 200$ GeV) | $t\bar{t}$ ($p_T > 600$ GeV) | QCD ($p_T > 600$ GeV) |
|---|---|---|---|---|
| default HTT | 0.337 | 0.0212 | 0.465 | 0.0489 |
| $\varepsilon_\text{Qjets} > 0.1$ | 0.435 | 0.0318 | 0.524 | 0.0661 |
| $\varepsilon_\text{Qjets} > 0.2$ | 0.384 | 0.0231 | 0.447 | 0.0461 |
| $\varepsilon_\text{Qjets} > 0.3$ | 0.341 | 0.0174 | 0.388 | 0.0342 |
| $\varepsilon_\text{Qjets} > 0.4$ | 0.298 | 0.0123 | 0.336 | 0.0245 |
| $\varepsilon_\text{Qjets} > 0.5$ | 0.250 | 0.0089 | 0.281 | 0.0168 |
| $\varepsilon_\text{Qjets} > 0.6$ | 0.212 | 0.0064 | 0.236 | 0.0118 |
| $\varepsilon_\text{Qjets} > 0.7$ | 0.163 | 0.0036 | 0.181 | 0.0062 |
| $\varepsilon_\text{Qjets} > 0.8$ | 0.118 | 0.0021 | 0.133 | 0.0032 |
| $\varepsilon_\text{Qjets} > 0.9$ | 0.064 | 0.0007 | 0.069 | 0.0009 |

Table III: Tagging efficiencies for $p_T > 200$ GeV (left) and $p_T > 600$ GeV (right). $\varepsilon_\text{Qjets}$ is defined as the number of
Qjets tags per number of Qjets runs. For this table we test 10,000 fat jets with 100 Qjets iterations.
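The comparison with the deterministic tagger can be reproduced directly from the numbers in Tab. III. The following sketch looks up the $\varepsilon_\text{Qjets}$ requirement whose signal efficiency is closest to that of the default HEPTopTagger. The struct `WorkingPoint` and the helper functions are illustrative names introduced here, and only the efficiencies are taken from the table.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of Tab. III: requirement on eps_Qjets and the resulting efficiencies.
struct WorkingPoint {
    double min_fraction;  // required eps_Qjets
    double eff_top;       // tagging efficiency on ttbar
    double eff_qcd;       // mis-tag probability on QCD
};

// Rows of Tab. III for pT > 200 GeV.
std::vector<WorkingPoint> table3_pt200() {
    return {{0.1, 0.435, 0.0318}, {0.2, 0.384, 0.0231}, {0.3, 0.341, 0.0174},
            {0.4, 0.298, 0.0123}, {0.5, 0.250, 0.0089}, {0.6, 0.212, 0.0064},
            {0.7, 0.163, 0.0036}, {0.8, 0.118, 0.0021}, {0.9, 0.064, 0.0007}};
}

// Rows of Tab. III for pT > 600 GeV.
std::vector<WorkingPoint> table3_pt600() {
    return {{0.1, 0.524, 0.0661}, {0.2, 0.447, 0.0461}, {0.3, 0.388, 0.0342},
            {0.4, 0.336, 0.0245}, {0.5, 0.281, 0.0168}, {0.6, 0.236, 0.0118},
            {0.7, 0.181, 0.0062}, {0.8, 0.133, 0.0032}, {0.9, 0.069, 0.0009}};
}

// Row whose signal efficiency is closest to `target_eff`.
WorkingPoint closest_working_point(const std::vector<WorkingPoint>& rows,
                                   double target_eff) {
    assert(!rows.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < rows.size(); ++i) {
        if (std::fabs(rows[i].eff_top - target_eff) <
            std::fabs(rows[best].eff_top - target_eff)) {
            best = i;
        }
    }
    return rows[best];
}
```

Called with the default efficiencies 0.337 and 0.465, the lookup selects the rows with thresholds 0.3 and 0.2, matching the 30% and 20% requirements quoted in the text.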























| name | description |
|---|---|
| `EARLY_MASSRATIO_SORT_MASS` | apply the 2D mass plane requirements, then select the candidate which minimizes $\lvert m_\text{cand} - m_t \rvert$ |
| `LATE_MASSRATIO_SORT_MASS` | select the candidate which minimizes $\lvert m_\text{cand} - m_t \rvert$ |
| `EARLY_MASSRATIO_SORT_MODDJADE` | apply the 2D mass plane requirements, then select the candidate with the highest modified Jade distance |
| `LATE_MASSRATIO_SORT_MODDJADE` | select the candidate with the highest modified Jade distance |
| `TWO_STEP_FILTER` | only analyze the candidate built with the highest $p_{T,t}$ after unclustering |

Table IV: HEPTopTagger working modes.
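The first four modes of Tab. IV differ only in whether the 2D mass plane requirements are applied before the selection and in the quantity used for sorting. The sketch below illustrates this logic under stated assumptions: the `Candidate` struct and `select_candidate` are hypothetical and do not reproduce the tagger's internal code, and for the late modes we assume that the mass plane requirements are simply not used as a pre-selection. `TWO_STEP_FILTER` is left out.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of one top candidate.
struct Candidate {
    double mass;             // candidate mass m_cand in GeV
    double mod_jade;         // modified Jade distance
    bool passes_mass_plane;  // outcome of the 2D mass plane requirements
};

enum class Mode {
    EARLY_MASSRATIO_SORT_MASS,
    LATE_MASSRATIO_SORT_MASS,
    EARLY_MASSRATIO_SORT_MODDJADE,
    LATE_MASSRATIO_SORT_MODDJADE
};

// Returns the index of the selected candidate, or -1 if none is left.
int select_candidate(const std::vector<Candidate>& cands, Mode mode,
                     double m_top = 172.3) {
    const bool early = mode == Mode::EARLY_MASSRATIO_SORT_MASS ||
                       mode == Mode::EARLY_MASSRATIO_SORT_MODDJADE;
    const bool by_mass = mode == Mode::EARLY_MASSRATIO_SORT_MASS ||
                         mode == Mode::LATE_MASSRATIO_SORT_MASS;
    int best = -1;
    double best_score = 0.0;
    for (int i = 0; i < static_cast<int>(cands.size()); ++i) {
        // Early modes discard candidates failing the mass plane requirements.
        if (early && !cands[i].passes_mass_plane) continue;
        // Lower score is better: |m_cand - m_t| or the negated Jade distance.
        const double score = by_mass ? std::fabs(cands[i].mass - m_top)
                                     : -cands[i].mod_jade;
        if (best < 0 || score < best_score) {
            best = i;
            best_score = score;
        }
    }
    return best;
}
```

In this reading an early mode can pick a different candidate than the corresponding late mode whenever the best-ranked candidate fails the mass plane requirements.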


Interface 


To apply the HEPTopTagger algorithm to a fat C/A jet constructed with FastJet3 [21], the only
necessary steps are executing the constructor `HEPTopTagger(fastjet::PseudoJet jet)` followed
by running the tagger using `void run()`. This will analyze the fat jet using the optimalR procedure with
the default settings given in Tab. V. The available operation modes are shown in Tab. IV. All configurable
parameters are listed in Tab. V. Functions to retrieve results are presented in Tab. VI.

`QHTT()` sets up the Qjets mode. It is applied to a fully configured HEPTopTagger by `void
run(HEPTopTagger htt)`. All configurable parameters are given in Tab. VII. A list of functions to access the
results is presented in Tab. VIII.

In addition, we provide a framework for the calculation of Fox-Wolfram moments that relies on an existing
installation of GSL [39]. While the constructor `FWM(vector<fastjet::PseudoJet> jets)` allows the
calculation of Fox-Wolfram moments for a given set of jets, `FWM(HEPTopTagger htt, unsigned selection)`
uses the $b$, $W_1$, and $W_2$ momenta from the HEPTopTagger run and calculates the Fox-Wolfram moments
in the top rest frame. The boost axis $a$ itself can also be included. Subsets of these four vectors can be set
via `unsigned selection`, as a sequence of 0 or 1 in the order $a\,b\,W_1\,W_2$. In Tab. IX we show how to extract
the Fox-Wolfram moment of a given order of the Legendre polynomials.

Finally, we include an example class `LowPt()` for a fixed low-$p_T$ working point. Its function
`is_tagged(HEPTopTagger)` returns a tagging decision that includes the low-$p_T$ mode settings.
| name | default | description |
|---|---|---|
| general: | | |
| `do_optimalR(bool)` | true | use optimalR approach |
| unclustering: | | |
| `set_mass_drop_threshold(double)` | 0.8 | mass drop threshold |
| `set_max_subjet_mass(double)` | 30 | max subjet mass for unclustering |
| filtering: | | |
| `set_filtering_R(double)` | 0.3 | max subjet distance for filtering |
| `set_filtering_n(unsigned)` | 5 | max subjet number for filtering |
| `set_filtering_minpt_subjet(double)` | 0. | min subjet $p_T$ for filtering |
| `set_filtering_jetalgorithm(fastjet::JetAlgorithm)` | cambridge_algorithm | jet algorithm for filtering |
| reclustering: | | |
| `set_reclustering_jetalgorithm(fastjet::JetAlgorithm)` | cambridge_algorithm | jet algorithm for reclustering |
| candidate selection: | | |
| `set_mode(enum)` | EARLY_MASSRATIO_SORT_MASS | run mode, see Tab. IV |
| `set_mt(double)` | 172.3 | true top mass |
| `set_mw(double)` | 80.4 | true W mass |
| `set_top_mass_range(double, double)` | 150, 200 | top mass window |
| `set_fw(double)` | 0.15 | width of A-shaped bands $f_W$ |
| `set_mass_ratio_range(double, double)` | $(1 - f_W)\, m_W/m_t = 0.397$, $(1 + f_W)\, m_W/m_t = 0.537$ | width of cut in 2D mass plane |
| `set_mass_ratio_cut(double, double, double)` | 0.35, 0.2, 1.3 | boundaries in 2D mass plane |
| `set_top_minpt(double)` | 200 | min $p_{T,t}$ consistency cut |
| pruning: | | |
| `set_pruning_zcut(double)` | 0.1 | $z_\text{cut}$ for pruned mass $m_\text{prune}$ |
| `set_pruning_rcut_factor(double)` | 0.5 | $r_\text{cut}$ for pruned mass $m_\text{prune}$ |
| optimalR: | | |
| `set_optimalR_max(double)` | size of the input fat jet | max jet size |
| `set_optimalR_min(double)` | 0.5 | min jet size |
| `set_optimalR_step(double)` | 0.1 | step size (multiple of 0.1) |
| `set_optimalR_threshold(double)` | 0.2 | optimalR mass threshold |
| calculation of $R_\text{opt}^\text{(calc)}$: | | |
| `set_filtering_optimalR_calc_R(double)` | 0.2 | max subjet distance for filtering |
| `set_filtering_optimalR_calc_n(unsigned)` | 10 | max subjet number for filtering |
| `set_optimalR_calc_fun(double (*f)(double))` | $327/p_{T,\text{filt}}$ | dependency of $R_\text{opt}^\text{(calc)}$ on $p_{T,\text{filt}}$ |
| optimalR type: | | |
| `set_optimalR_type_top_mass_range(double, double)` | 150., 200. | mass range for optimalR type 1 |
| `set_optimalR_type_f_rec(double)` | 0.175 | max $f_\text{rec}$ for optimalR type 1 |
| `set_optimalR_type_max_diff(double)` | 0.3 | max $R_\text{opt} - R_\text{opt}^\text{(calc)}$ for optimalR type 1 |
| N-subjettiness: | | |
| `set_filtering_optimalR_pass_R(double)` | 0.2 | $R_\text{filt}$ for optimalR type 1 |
| `set_filtering_optimalR_pass_n(unsigned)` | 5 | $N_\text{filt}$ for optimalR type 1 |
| `set_filtering_optimalR_fail_R(double)` | 0.3 | $R_\text{filt}$ for optimalR type 0 |
| `set_filtering_optimalR_fail_n(unsigned)` | 3 | $N_\text{filt}$ for optimalR type 0 |

Table V: Additional parameters of the HEPTopTagger algorithm. All functions have a return type of void.
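The default mass ratio window quoted in Tab. V follows from the default values of $f_W$, $m_W$, and $m_t$. A short check of this arithmetic is sketched below. The function name `mass_ratio_range` is an illustrative choice and not the tagger's own setter.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Lower and upper edge of the mass ratio window,
// (1 - f_W) * m_W / m_t and (1 + f_W) * m_W / m_t.
std::pair<double, double> mass_ratio_range(double f_w, double m_w, double m_t) {
    assert(m_t > 0.0);
    const double ratio = m_w / m_t;
    return {(1.0 - f_w) * ratio, (1.0 + f_w) * ratio};
}
```

For $f_W = 0.15$, $m_W = 80.4$, and $m_t = 172.3$ the two edges round to 0.397 and 0.537, in agreement with the defaults listed in the table.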
| name | description |
|---|---|
| `bool is_maybe_top()` | top mass window requirement passed? |
| `bool is_masscut_passed()` | 2D mass plane requirements passed? |
| `bool is_minptcut_passed()` | candidate $p_{T,t}$ threshold passed? |
| `bool is_tagged()` | top mass window, 2D mass plane requirement, and $p_{T,t}$ threshold passed? |
| `double delta_top()` | $m_\text{rec} - m_t$ |
| `double djsum()` | modified Jade distance |
| `double pruned_mass()` | pruned top mass |
| `double unfiltered_mass()` | mass of the triplet of subjets after unclustering before filtering |
| `double f_rec()` | minimal $\lvert (m_{ij}/m_\text{rec}) / (m_W/m_t) - 1 \rvert$ |
| `const PseudoJet & t()` | top candidate 4-vector |
| `const PseudoJet & b()` | subjet corresponding to the $b$ |
| `const PseudoJet & W()` | combined subjets corresponding to the $W$ |
| `const PseudoJet & W1()` | leading subjet from the $W$ |
| `const PseudoJet & W2()` | sub-leading subjet from the $W$ |
| `const std::vector<PseudoJet> & top_subjets()` | three subjets from the top, ordered: $b$, $W_1$, $W_2$ |
| `const PseudoJet & j1()` | leading subjet |
| `const PseudoJet & j2()` | sub-leading subjet |
| `const PseudoJet & j3()` | sub-sub-leading subjet |
| `const std::vector<PseudoJet> & top_hadrons()` | all top constituents |
| `const std::vector<PseudoJet> & hardparts()` | hard substructures after unclustering, sorted by $p_T$ |
| `const PseudoJet & fat_inital()` | original fat jet (after Qjets reclustering) |
| `const PseudoJet & fat_Ropt()` | fat jet reduced to $R_\text{opt}$ |
| `void get_setting()` | print settings to stdout |
| `void get_info()` | print tagger information to stdout |
| `HEPTopTagger HTTagger(unsigned i)` | HEPTopTagger candidate for a distance parameter $R = i/10$. By default all functions above return values at $R = R_\text{opt}$. This function accesses candidates for different values of $R$. |
| `double Ropt()` | $R_\text{opt}$ |
| `double Ropt_calc()` | $R_\text{opt}^\text{(calc)}$ |
| `int optimalR_type()` | result of set optimalR working point. 1 = pass, 0 = fail |
| `double nsub_unfiltered(int order, fastjet::contrib::Njettiness::AxesMode axes = fastjet::contrib::Njettiness::kt_axes, double beta = 1., double R0 = 1.);` | N-subjettiness $\tau_N$ for the unfiltered fat jet |
| `double nsub_filtered(int order, fastjet::contrib::Njettiness::AxesMode axes = fastjet::contrib::Njettiness::kt_axes, double beta = 1., double R0 = 1.);` | N-subjettiness $\tau_N^\text{(filt)}$ for the fat jet after filtering depending on `optimalR_type()`. |
| `double q_weight()` | weight of used Qjets history |

Table VI: Functions to retrieve results of the HEPTopTagger algorithm.
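The quantity returned by `f_rec()` in Tab. VI can be illustrated with a few lines of standalone code. The sketch assumes that the three pairwise subjet masses $m_{ij}$ and the reconstructed mass $m_\text{rec}$ are already known. The free function below and its argument layout are assumptions made for illustration and are unrelated to the member function of the tagger.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Minimal | (m_ij / m_rec) / (m_W / m_t) - 1 | over the three subjet pairs.
double f_rec(const std::array<double, 3>& m_pairs, double m_rec,
             double m_w = 80.4, double m_t = 172.3) {
    assert(m_rec > 0.0 && m_t > 0.0);
    const double target = m_w / m_t;
    double best = std::fabs((m_pairs[0] / m_rec) / target - 1.0);
    for (double m_ij : m_pairs) {
        best = std::min(best, std::fabs((m_ij / m_rec) / target - 1.0));
    }
    return best;
}
```

A small value means that at least one subjet pair reproduces the $W$ to top mass ratio, which is what the window of width $f_W$ around $m_W/m_t$ tests.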





| name | default | description |
|---|---|---|
| `set_iterations(unsigned)` | 100 | number of Qjets iterations |
| `set_q_zcut(double)` | 0.1 | $z_\text{cut}$ for pruning in Qjets |
| `set_q_dcut_fctr(double)` | 0.5 | $D_\text{cut}$ factor for pruning in Qjets |
| `set_q_exp(double a, double b)` | 0., 0. (C/A) | set distance measure for Qjets $d_{ij} = \min(p_{T,i}, p_{T,j})^a \, \max(p_{T,i}, p_{T,j})^b \, R_{ij}^2$ |
| `set_q_rigidity(double)` | 0.1 | rigidity $\alpha$ for Qjets |
| `set_q_truncation_fctr(double)` | 0. | threshold for merging probability $\omega_{ij}$ in Qjets |

Table VII: Parameters of the Qjets frame for the HEPTopTagger. All functions have a return type of void.
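The role of the two exponents in `set_q_exp` can be seen from a direct evaluation of the distance measure. The sketch below assumes that the angular separation enters squared, so that $a = b = 0$ gives the C/A measure and $a = 2$, $b = 0$ the $k_T$ measure. The function `qjets_distance` is a hypothetical helper, not part of the Qjets code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// d_ij = min(pT_i, pT_j)^a * max(pT_i, pT_j)^b * R_ij^2
// The squared angular separation is an assumption of this sketch.
double qjets_distance(double pt_i, double pt_j, double r_ij,
                      double a, double b) {
    const double pt_min = std::min(pt_i, pt_j);
    const double pt_max = std::max(pt_i, pt_j);
    return std::pow(pt_min, a) * std::pow(pt_max, b) * r_ij * r_ij;
}
```

With the default exponents the transverse momenta drop out and only the geometric distance between the two subjets matters.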


| name | description |
|---|---|
| `HEPTopTagger leading()` | HEPTopTagger with leading tagged history |
| `HEPTopTagger subleading()` | HEPTopTagger with subleading tagged history |
| `double weight_leading()` | Qjets weight of the leading tagged history |
| `double weight_subleading()` | Qjets weight of the subleading tagged history |
| `double eps_q()` | $\varepsilon_\text{Qjets}$ |
| `double m_mean()` | $\langle m \rangle$ for the tagged histories |
| `double m2_mean()` | $\langle m^2 \rangle$ for the tagged histories |

Table VIII: Functions to retrieve results of the Qjets frame.
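The two moments listed in Tab. VIII are sufficient to obtain the variance of the reconstructed mass over the tagged histories through $\langle m^2 \rangle - \langle m \rangle^2$. The sketch below shows this relation on a plain list of masses. The struct `MassMoments` and both functions are illustrative assumptions rather than the interface of the Qjets frame.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// First and second moment of the reconstructed mass distribution.
struct MassMoments {
    double mean;     // <m>
    double mean_sq;  // <m^2>
};

// Accumulate <m> and <m^2> over the masses of the tagged histories.
MassMoments mass_moments(const std::vector<double>& masses) {
    assert(!masses.empty());
    double sum = 0.0;
    double sum_sq = 0.0;
    for (double m : masses) {
        sum += m;
        sum_sq += m * m;
    }
    const double n = static_cast<double>(masses.size());
    return {sum / n, sum_sq / n};
}

// Variance from the two moments: <m^2> - <m>^2.
double mass_variance(const MassMoments& mm) {
    return mm.mean_sq - mm.mean * mm.mean;
}
```

For two tagged histories with masses of 170 and 174 GeV this gives a mean of 172 GeV and a variance of 4 GeV$^2$.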


| name | description |
|---|---|
| `double U(unsigned)` | FWM of given order with unit weight |
| `double Pt(unsigned, fastjet::PseudoJet=(0., 0., 1., 0.))` | FWM of given order with $p_T$ weight relative to the given reference vector. |

Table IX: Functions to retrieve Fox-Wolfram moments.